Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython perf simplification #861

Merged
merged 14 commits into from
Nov 26, 2024
Merged

Cython perf simplification #861

merged 14 commits into from
Nov 26, 2024

Conversation

zerothi
Copy link
Owner

@zerothi zerothi commented Nov 6, 2024

  • Closes Pk on complex matrices #104
  • Added tests for new/changed functions?
  • Documentation for functionality in docs/
  • Changes documented in CHANGELOG.md

Here are some details that needs to be sorted out before merging:

  • Adding tests for complex spin-orbit calculations, this includes post-processing like PDOS etc.
    The idea is that instead of 8 numbers, we can now do 4, [uu, dd, ud, du], this has the problem of
    overlap being stored as a complex number, but that might be ok.
  • Document the SOC TB handling a bit better, right now the order of elements are not intuitive.

@zerothi
Copy link
Owner Author

zerothi commented Nov 6, 2024

@pfebrer you won't like this...

Sadly, pure-python mode does not allow cimport statements.

This code-restructuring is a first step to reduce the amount of Cython code.

@@ -88,13 +88,16 @@
# import the common options used
from ._common import *

from ._core import *

Check notice

Code scanning / CodeQL

'import *' may pollute namespace Note

Import pollutes the enclosing namespace, as the imported module
sisl._core
does not define '__all__'.
Copy link

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 85.10638% with 56 lines in your changes missing coverage. Please review.

Project coverage is 86.97%. Comparing base (8559eda) to head (6fedebb).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/sisl/physics/sparse.py 44.18% 48 Missing ⚠️
src/sisl/io/siesta/_help.py 92.85% 6 Missing ⚠️
src/sisl/io/siesta/binaries.py 93.54% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #861      +/-   ##
==========================================
- Coverage   86.98%   86.97%   -0.01%     
==========================================
  Files         404      405       +1     
  Lines       52638    52792     +154     
==========================================
+ Hits        45785    45916     +131     
- Misses       6853     6876      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zerothi zerothi force-pushed the cython-perf-simplification branch 2 times, most recently from 23ab3b7 to 2a5ec18 Compare November 7, 2024 09:30
@pfebrer
Copy link
Contributor

pfebrer commented Nov 7, 2024

Sadly, pure-python mode does not allow cimport statements.

It does, see here:

import cython
import cython.cimports.numpy as cnp
import numpy as np
from cython.cimports.libc.math import sqrt

Do you mean that it doesn't allow to cimport sisl files?

@zerothi
Copy link
Owner Author

zerothi commented Nov 7, 2024

Sadly, pure-python mode does not allow cimport statements.

It does, see here:

import cython
import cython.cimports.numpy as cnp
import numpy as np
from cython.cimports.libc.math import sqrt

Do you mean that it doesn't allow to cimport sisl files?

True, I misunderstood this here...

You're right, but for this to work for sisl intrinsics, you would have to duplicate interfaces in pxd files, and you wouldn't be able to run that code in pure python mode, which kind of defeats the purpose. :S

See here: http://docs.cython.org/en/latest/src/tutorial/pure.html#cimports

@pfebrer
Copy link
Contributor

pfebrer commented Nov 7, 2024

Well, of course it depends on how complex things are, but you could always mock the cimports inside an if cython.compiled: clause so that the file runs in the python interpreter.

But apart from files running out of the box in normal python (which is nice), for me it is a question of readability and the feeling that the code can be ported to something else much more easily if there is a better accelerating framework than cython.

Not saying we should translate everything to pure python, just giving some arguments in favour of the pure python syntax :)

@zerothi zerothi force-pushed the cython-perf-simplification branch from 5014eac to 295c413 Compare November 8, 2024 12:38
src/sisl/io/siesta/_help.py Fixed Show fixed Hide fixed
@@ -98,45 +99,177 @@
csr.translate_columns(col_from, col_to)


def _mat_spin_convert(M, spin=None):
def _mat2dtype(M, dtype: np.dtype) -> None:

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
src/sisl/io/siesta/tests/test_tsde.py Fixed Show fixed Hide fixed
This was not really intended. But the amount
of different codes that were not using fused tyes
was annoying and prohibited further development.

I initially tried to implement pure-python mode
codes. However, it seems to be impossible to do
pure-python mode and cimport it into another
pure-python mode code.

Before this will be merged, I need to benchmark
the changes made.

I *hope* it is faster, but I am not sure anything
happened.

The main change is that the fused datatypes
are omnipresent and that the ndarray data
type has been removed almost completely.
So now, the memoryviews are prevalent.

Signed-off-by: Nick Papior <[email protected]>
This means that it is much simpler to track problems.
There is no code duplication in the spin-box calculations.

I am not sure whether the python-yellow annotations are
only for the int/long variables, and when down-casting.
In any case, I think this is more easy to manage.

Signed-off-by: Nick Papior <[email protected]>
Simple benchmarks showed that siesta Hamiltonians
to Hk can be much faster by changing how the
folding of the matrices are done.

Instead of incrementally adding elements, and searching
for duplicates before each addition of elements, we know
built the entire array, and use numpy.unique to reduce
the actual array. This leverages the numpy unique
function which already returns a sorted array.

It marginally slows down csr creation of matrices with
few edges per orbital (TB models). But will be
much faster for larger models stemming from DFT
or the likes.

Tests for this commit:

    %timeit H.Hk()
    %timeit H.Hk([0.1] * 3)
    %timeit H.Hk(format="array")
    %timeit H.Hk([0.1] * 3, format="array")

For a *many* edge system, we get:

    67.2 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    85.4 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    5.59 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    11.3 ms ± 39.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

While for a *few* edge system, we get:

    9.1 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    9.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    5.75 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    6.17 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

For commit v0.15.1-57-g6bbbde39 we get:

    196 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    214 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    6.58 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    12.8 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and

    7.41 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    7.37 ms ± 73.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    6.04 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    5.81 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Signed-off-by: Nick Papior <[email protected]>
Enabling fortran is necessary for it to populate details of the fortran
world.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
This should enable simpler access to data

Signed-off-by: Nick Papior <[email protected]>
Lots of the code were built for floats.
This now fixes the issue of reading/writing
sparse matrices in specific data-formats.

It allows a more natural way of handling SOC
matrices, with complex interplay, say:

 H[0, 0, 2] = Hud

as a complex variable, when dealing with floats
one needs to do this:

 H[0, 0, 2] = Hud.real
 H[0, 0, 3] = Hud.imag

which is not super-intuitive.

Currently there are still *many* hardcodings of the indices.

And we should strive to move these into a common framework
to limit the problems it creates.

Tests has been added that checks Hamiltonian eigenvalues
and Density matrices mulliken charges. So it seems it works
as intended, but not everything has been fully tested.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
@zerothi zerothi force-pushed the cython-perf-simplification branch from e23989a to f6df3fb Compare November 25, 2024 14:32
fixed errors in col creation of dtypes when calling
csr.diags()

Ensured that sparsegeometry.finalize accepts any arguments the
csr.finalize accepts.

Removed casting errors in mat2dtype. This may *hide* potential
problems when there are non-zero imaginary parts.
We should perhaps later revisit the problem considering
that TB + Peierls has complex overlap matrices.

Fixed issue when a construct was called with Python
intrinsic complex values.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
@zerothi zerothi merged commit c6253c8 into main Nov 26, 2024
16 of 17 checks passed
@zerothi zerothi deleted the cython-perf-simplification branch November 26, 2024 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pk on complex matrices
2 participants